
REMARKS 


Claims 1, 2 and 7 have been amended to broaden the claims. 

In response to the rejection of claims 1-25 under 35 USC 112, first 
paragraph, alleging that "the determination of direction dependent at 
least at times on the video signals" was not described in the 
specification, applicant traverses the rejection because the use of 
video signals to determine direction is well supported in the 
specification. 

In the specification at page 2, lines 8-9 "Computer vision 
algorithms are used to detect, locate, and track people in the field of 
view of a wide-angle, stationary video camera." Also, in the 
specification at page 2, lines 19-20 "this approach allows the video 
conferencing system to accurately track moving people regardless of 
whether they speak or not". Also, at page 3, lines 2-3 "a multimodal 
integration architecture system for processing said image signals and
said audio signals to determine a direction of the audio source relative
to a reference point". Thus in the invention herein, the determination
of the direction of a speaker is dependent on the video signals at least
when the speaker is not speaking.

In addition, the documents that are incorporated herein by 
reference include detailed discussion of determining the direction of an 
audio source depending on video signals. For example, in WO 99/60788 on
page 8 "A moving speaker, such as one giving a presentation, can be 
tracked by tracking his image." 

In response to the rejection of claims 1-3, 5-8, 10-25 under 35 USC 
102(b), allegedly for being anticipated by US5686957 to Baker, the 
citation does not identically disclose every element of the claims.

More specifically, Baker does not suggest "processing such image
signals ... to determine the direction of the source relative to a
reference point" as in claims 1, 10 and 25. Also, Baker does not
suggest "processing the video signals ... to determine a direction to the
speaker relative to a reference point" as in claim 16. All the other
rejected claims are dependent on claims 1, 10, 16 or 25 and are thus
allowable for at least the same reasons as those claims.

In response to the rejection of claim 4 under 35 USC 103(a) for 
allegedly being unpatentable over Baker, Baker does not suggest the 
elements of claim 4. Baker does not suggest "processing such image
signals ... to determine the direction of the source relative to a
reference point" as in claim 1 on which claim 4 depends. Also, Baker
does not suggest "an integrated housing ... incorporating the image
pickup device, the audio pickup device, and the multimodal integration
architecture system" as in claim 3 on which claim 4 depends. Baker
teaches away from an integrated housing because in Baker at column 9,
lines 15-20, the "invention is comprised of four microphones spaced
apart which would be arranged concentrically about the lens and camera
on a conference room table so that all the participants in the
conference have audio access to the microphones for transmission of
sound". It is necessary in Baker to space the microphones apart because
the only method proposed by Baker to determine the direction to the
speaker is at column 9, lines 29-32, "each microphone input would be
sampled to determine which has the largest amplitude of signals or which
one has signals, to determine the specific direction for steering the
video camera". The apparatus disclosed by Baker would not work if it
were integrated into a portable unit because then all the microphones
would receive about the same amplitude.

In response to the rejection of claim 9 under 35 USC 103(a) for 
allegedly being unpatentable over Baker in view of US patent 5,778,082 
to Chu, the combination does not suggest "processing such image signals 
... to determine the direction of the source relative to a reference
point" as in claim 1 on which claim 9 depends. The examiner's statement
that "Baker differs from the claimed invention in not disclosing the use
of an array of two microphones" is clearly erroneous. The term
"comprised" is used in claims to introduce an open list so that
"comprised of an array of two microphones" clearly reads on any array of
two or more microphones such as the array of 4 microphones of Baker.


The claims are definite and distinguished from the citations and 
Applicant respectfully requests the allowance of all claims. 


Respectfully submitted, 